When the Mind’s Ear Doesn’t Hear what the Brain’s Ear Does: McGurk Effect Clues to the Role of Lexical Information in Spoken Word Recognition

Authors

  • Arthur G. Samuel
  • Jerrold Lieblich
Abstract

A model of spoken word recognition must specify the types of representations involved in the process and the ways that information flows among them. Prior research has shown that lexical activation can produce phonemic activation that is “real” enough for the phonemes to affect the perception of later speech sounds through selective adaptation. In contrast, similar studies have found that the phonemic percepts generated by the (audiovisual) McGurk effect do not produce adaptation. The current study adds lexical support to McGurk-based adaptors, and replicates their inability to generate adaptation. Instead, purely visual adaptors produced adaptation effects on auditory test syllables. The results from the current and previous studies can be understood with two assumptions: (1) lexical information actively supports the perception of phonemic codes when the signal is underspecified in some way (e.g., noisy, ambiguous, missing), and (2) lexical information cannot affect lower-level perception when the bottom-up signal is strong.

Spoken word recognition is a fundamental yet complex human ability. The speech signal is acoustically highly variable, making word recognition a very difficult problem, as shown by the limited success of machine-based systems. There is evidence that one way humans solve this problem is by combining the acoustic information available in the signal with other potential sources of information. For example, lexical (word) information can influence how people hear speech sounds (phonemes). A striking example of such an effect is the phonemic restoration illusion (Warren, 1970): when part of a word is removed from the signal and replaced by an extraneous sound (e.g., white noise), listeners consistently report that the speech sounds intact; they seem to perceptually restore the missing speech sound.
Samuel (1981, 1996) has shown that the strength of restoration varies with the strength of the lexical context, indicating that lexical information plays a critical role in generating the percept of the missing phonemic information. A similar lexical influence was first reported by Ganong (1980), who showed that when phonetic information is intentionally made ambiguous, listeners’ percepts are strongly influenced by the lexical context. For example, if a sound is midway between /d/ and /t/, it will be reported as /d/ if the following context is “ash” (“dash”), but as /t/ if “ask” follows (“task”), because those interpretations form words and the alternatives (“tash”, “dask”) do not.

A second type of potentially supportive information is provided by the visual information in a speaker’s face. The performance of skilled lipreaders illustrates the availability of this information. McGurk and MacDonald (1976) demonstrated that even untrained listeners use such visual cues. These authors presented subjects with audiovisual stimuli in which mismatching sound tracks were dubbed onto the videos. For example, a video of a face saying /ga/ could be paired with the audio signal from /ba/. Listeners often report that they hear /da/ under these circumstances, a response that is clearly heavily influenced by the visual input, even though the instructions emphasize the need to report what is actually heard.

A limitation of all of these results is that they rely on listeners’ subjective reports of what they hear. Some researchers (e.g., Norris, McQueen, & Cutler, 2000) have argued that the contextual effects could reflect decision-level influences, rather than true perceptual effects.
Theorists who favor such fully bottom-up models point to a potentially fatal problem with systems that include top-down perceptual mechanisms: in theory, such systems could lead people to hallucinate; perceiving what one expects, rather than what is present in the signal, can be disastrous. In the current study, we examine this issue and provide data that support a perceptual system that can simultaneously gain the advantages offered by lexical support of phonemic perception while avoiding hallucination. Our results specify a critical constraint on the operation of top-down mechanisms (cf. Mirman, McClelland, & Holt, 2005).

As we noted, most studies of contextual influences in spoken word recognition depend on subjective reports that are potentially subject to decision-level interpretations. One way to address this concern is to look for evidence of such contextual effects under conditions in which the listener does not make decisions about the speech itself; rather, one looks for consequential effects that should occur if the listener had indeed perceived the speech in accord with the context. A very useful paradigm for this type of consequential test is the selective adaptation paradigm. In an adaptation experiment, listeners first identify members of a set of syllables that comprise a continuum (e.g., with /da/ at one end and /ta/ at the other). After producing such a baseline measure of how they hear these syllables, the listeners go through an adaptation phase. In this phase, a sound (the “adaptor”) is played repeatedly, with occasional breaks during which listeners again identify syllables from the test series. As Eimas and Corbit (1973) originally showed, adaptation produces a contrastive effect, changing how people identify the test syllables. For example, if /da/ is the adaptor, fewer test syllables will be identified as /da/ after adaptation than on the baseline; if /ta/ is the adaptor, there will be fewer reports of /ta/.
Samuel (1997, 2001) has used adaptation to establish the perceptual reality of lexically driven phonemes. Listeners in Samuel (1997) first identified /bI/-/dI/ test syllables. Adaptation was then conducted using words in which either /d/ (e.g., “armadillo”) or /b/ (e.g., “exhibition”) had been replaced by white noise. Even though these adaptors did not have any acoustic basis for /d/ or /b/, they produced reliable adaptation shifts, showing that phonemic restoration had produced functional phonemes. Samuel (2001) found a comparable result for adaptors based on the Ganong effect: words whose final “sh” (e.g., “abolish”) or “s” (e.g., “arthritis”) was replaced by a sound midway between “s” and “sh” produced adaptation shifts on “iss”-“ish” test syllables. These results show that lexical representations can generate true percepts of their component phonemes.

Roberts and Summerfield (1981) used the same approach to test the perceptual status of audiovisually determined percepts. They had subjects identify /bε/-/dε/ test syllables before and after adaptation. The critical adaptor was presented audiovisually, comprising a visual /gε/ paired with an auditory /bε/, which was heard as /dε/. Unlike the lexical cases, this procedure did not produce adaptation based on the contextually determined percept (/dε/); instead, the shifts were identical to those found with /bε/ (the auditory component of the audiovisual adaptor). A control condition with only the video (silent /gε/) produced no adaptation at all. Saldaña and Rosenblum (1994) conducted a follow-up to this study, using improved stimuli and procedures, and replicated all of the results: the audiovisual adaptor acted just like the purely auditory one, and the purely visual adaptor did nothing. The literature thus provides conflicting results.
Lexical context produces percepts that can sustain adaptation, but audiovisual context does not, even though both types of context create persuasive subjective experiences. We have conducted a series of experiments designed to reconcile this conflict. More generally, the study is intended to clarify when context will produce true perceptual consequences and when it will not, and in so doing delineate the relationship between bottom-up and top-down processing in spoken word recognition.

The experiments were generated by the notion that it might be possible to boost the audiovisual context effect by giving it lexical support. The two studies that found no adaptation by the audiovisually determined percept both used nonlexical, simple consonant-vowel stimuli. Recent work (Brancazio, 2004) has shown that the “McGurk” percept is stronger when the audiovisual combination yields a real word than when it does not. Given this, it may be possible to produce an adaptation effect with audiovisual adaptors if they correspond to real words. To maximize comparability to the lexical literature, we modeled our stimuli closely on those used in Samuel’s (1997) lexical adaptation study. But rather than producing a /d/ adaptor by putting white noise in place of the /d/ in words like “armadillo”, we produced the /d/ by pairing a visual “armagillo” with an auditory “armabillo”. If the lexical support for this audiovisual percept is strong enough, it could produce /d/ adaptation.

Similar articles

Lateralization of lexical codes in auditory word recognition.

Three experiments examined the lateralization of lexical codes in auditory word recognition. In Experiment 1 a word rhyming with a binaurally presented cue word was detected faster when the cue and target were spelled similarly than when they were spelled differently. This orthography effect was larger when the target was presented to the right ear than when it was presented to the left ear. Ex...

Lexical influences on the McGurk effect

The purpose of this research was to explore the interrelationship of audiovisual speech perception and spoken word recognition. I tested whether an index of audiovisual integration, the “McGurk effect,” would be influenced by the lexical status of the stimuli. There was a significant increase in the McGurk effect when the visually-influenced percept formed a word than when it formed a nonword, ...

Lexical effects on dichotic word recognition in young and elderly listeners.

Dichotic listening was evaluated using monosyllabic word pairs that differed in lexical difficulty as defined by the Neighborhood Activation Model of spoken word recognition. Four combinations of lexically EASY and lexically HARD words were evaluated (same pair: EASY-EASY, HARD-HARD; or mixed pair: EASY-HARD, HARD-EASY) in young adult listeners with normal hearing and older adult listeners with...

Examining potential hemispheric differences in talker effects in spoken word recognition

Variability in talker identity, one type of indexical variation, has demonstrable effects on the speed and accuracy of spoken word recognition. Furthermore, variability in visual word recognition, such as changes in font, appears to affect processing differently, depending on which cerebral hemisphere initially processes the input. The present study examined whether such hemispheric differences...

Quantitative evaluation of lexical status, word frequency, and neighborhood density as context effects in spoken word recognition.

Listeners identified a phonetically balanced set of consonant-vowel-consonant (CVC) words and nonsense syllables in noise at four signal-to-noise ratios. The identification scores for phonemes and syllables were analyzed using the j-factor model [Boothroyd and Nittrouer, J. Acoust. Soc. Am. 84, 101-114 (1988)], which measures the perceptual independence of the parts of a whole. Results indicate...

Publication date: 2006